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1 The ABC-PRC Algorithm 

A sequential Monte Carlo method for performing approximate Bayesian Computation 
("Monte Carlo without likelihoods") has been proposed by Sissons, Fan and Tanaka 
(PNAS, 2007). The main algorithm that is used in their paper is given here in Fig- 
ure [Tj In this algorithm N is the number of particles 9 is a vector of parameters, the 
e t are tolerances such that if p(S(x),S(xo)) < e the simulated summary statistics, S(x) 
are considered 'near enough' to the target summary statistics S(xq), where pQ is some 
distance function. The authors state 

Samples {0j}} are weighted samples from the posterior distribution f(9\p(S(x),X(xo)) < 
<-)■ 

They also state 

Finally we note that if Kt{0t\0t-\) = Lt-i(0t-i\0t), Pi(9) = an d the prior 

n(8) oc 1 over G, then all weights are equal throughout the sampling process and 
maybe safely ignored... 

2 Problematic Aspects 



Intuitively the simplified algorithm, which arises from making the assumptions in the 
latter statement above, is rather puzzling because it is unclear what corrects for the fact 
that one is sampling from a distribution that, if the transition kernel variance, K t (9 t \9 t _i), 
is small, is progressively moving away from the prior. 

In the algorithm given in Figure [T] there is the statement: if p{S(x * *),S(xq)) > e», 
then go to PRC2.1. Intuitively this makes the algorithm superficially appear as N parallel 
MCMC-ABC chains, with the exception that there is resampling among the chains at each 
iteration. Certainly if one takes the special case of AT = 1 then it looks similar to the 
ABC-MCMC algorithm of Marjoram et al. (2003). However a crucial difference is that 
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ABC-PRC Algorithm 

PRC1. Initialize ei, . . . , er, and specify initial sampling distri- 
bution m. 

Set population indicator t = 1. 
PRC2. Set particle indicator i = 1. 

PRC2.1. If / = 1, sample 9** ~ independently from ui. 
If / > 1, sample©* from the previous population {$-1} 
with weights If^ 1 !,). and perturb the particle to 0** ~ 
Kt(6\0*) according to a Markov transition kernel K t . 
Generate a data set*** ~ f(x\8**). 
If p(S(x**), S(x )) a e„ then go to PRC2.1. 
PRC2.2. Set 

[ irie'^/nM) if r = 1 

ei" = and j^ ;> = *'«;•'''/•, <>/■> iU>l , 
[ TT(8*)K t (ef\d*) 

where ir(6) denotes the prior distribution for 0, and 
L, - i is a backward transition kernel. 

If i < N, increment / = i + 1 and go to PRC2.1. 
PRC3. Normalize the weights so that I,? =1 Wp = 1. 

If ESS = [J.f =1 (W^) 2 \- 1 < E then resample with 
replacement, the particles {(!}''} with weig hts {Wf >} to 
obtain a new population {0}'-}. and set weights { = 
l/N}. 

PRC4. If / < T, increment t = t + 1 and go to PRC2. 



Figure 1: The ABC-PRC algorithm of Sissons et ai, PNAS, 2007 



in the ABC-MCMC the current value is the old value if the update is not accepted, 
otherwise it is the new value, whereas in this SMC algorithm you keep going until you get 
an acceptance. So in the ABC-MCMC algorithm, you are guaranteed that if the point 
you update is already from the posterior distribution, the next point will also be from 
the posterior (as given by the proof, based on satisfying detailed balance, in Marjoram 
et al, 2003, page 15325), whereas in the SMC this is not the case. To see this, imagine 
that the kernel chosen has a very small variance (close to 0, in fact). In the ABC-MCMC, 
following the proof of detailed balance in Marjoram et al (2003), we are guaranteed that 
if $i is drawn from the posterior, then is also drawn from the posterior (whatever 
variance of the kernel is chosen). In the SMC, for example, imagine that one has only two 
resampling steps, and imagine that a sample is taken with e close to 0, and kernel variance 
close to 0. The first step gives you (almost) the posterior distribution , 9ip(9\x = x c ). 
The next step, since the prior is now actually the posterior, is exactly like performing 
standard rejection with such a prior, and gives you (almost) [p(#|a; = x )} 2 ), the square of 
the posterior distribution, and so on, progressively. Thus, initially at least, the posterior 
distribution generated at the ith step is an increasingly poor estimate of [p(#|a;)] 1 ). 
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3 Examples 

As a toy example consider the case of computing the posterior distribution of the para- 
metric mean, fi of a Gaussian distribution with a known variance a 2 , given a vector of n 
observations y. The prior for // is taken to be Gaussian with mean /io and variance r 2 . 
The posterior p(fi\y) is then given as 




where y is the arithmetic mean of the elements of y, is a sufficient statistic for this problem, 
and is used as the only statistic for the ABC analyses described here. In order to test the 
simplified version of the ABC-PRC algorithm above (although the argument applies also 
to the more complicated version), I let r 2 — > oo so that a flat improper prior is assumed, 
in which case one obtains 




Figure [2] indicates the application of the ABC-PRC algorithm of Sissons et al. (2007) 
to data with y = 4.786624 and n = 10. The known variance, a 2 is set at 9, and so 
the posterior variance should be 0.9, with posterior mean of 4.786624. To approximate 
a flat uniform prior, a uniform with bounds (-15,15) is used for the initial sample. The 
ABC-PRC algorithm was run for 100 iterations. The schedule of tolerances is given below: 
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The method converges smoothly, as indicated in Figure [3], but to a variance that is too 
small. The true posterior variance is 0.9. The posterior variance to which the ABC-PRC 
method appears to converge is around 0.26. 

Using an even narrower kernel of 0.01, we can see even greater discrepancies, as il- 
lustrated in Figures H] and El In this case the posterior variance to which the method 
converges is around 0.094, almost one-tenth of the correct value. 

Increasing the variance of the kernel to 1, we see that the estimated posterior variance, 
now around 0.59, becomes closer to the true value of 0.9 (Figure [7]), and, superficially, the 
posterior distribution looks similar to the theoretical distribution (Figure [61 
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Figure 2: Application of the ABC-PRC algorithm with kernel variance of 0.1 



Finally with the variance of the kernel at 10, the estimated posterior variance is 
around 0.84 (Figure[H]), still lower than the theoretical value, and the posterior distribution 
appears very similar to the theoretical distribution (Figure E]). 



4 Solution 

These results suggest that a good way to look at the ABC-PRC algorithm of Sisson et 
al. (2007) is as a successive series of applications of the rejection algorithm to random 
variables drawn from a prior distribution that is a convolution of the smoothing kerneal 
and the realized variables from the previous rejection round. If this kernel is large relative 
to the posterior distribution the method will appear to work because it is similar to 
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Convergence of posterior variance when kernel variance is 0.1 
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Figure 3: Application of the ABC-PRC algorithm with kernel variance of 0.1. The hori- 
zontal line shows the theoretical value. 

drawing the variables from a uniform prior. However it can be seen that with a lower 
kernel variance, the posterior distribution is badly estimated. Viewed in this way, it is 
clear that the weights do matter, and the appropriate correction, at least for the Gaussian 
kernel used here, is to compute the weights, W^),i for the ith particle, from the reciprocal 
of the (unnormalised) kernel density estimate as 



Wt 



(t),i 



and variance of the kernel that 



where | ^t-yj, £ 2 ) is a Gaussian with mean 0(t-i)j 

is used, £ 2 . 

Figures fTUl and fTTl show the results of applying this corrected version of the ABC-PRC 
algorithm to the example with a kernel of variance 0.01, illustrated in Figures H] and [5j 
In this case the estimate variance converges to around 0.88, which compares favourably 
with the uncorrected ABC-PRC algorithm even when a large kernel is used. 

In conclusion, it would appear that the ABC-PRC algorithm of Sissons et al. is wrong 
and should not be used. It can, however, be corrected by the addition of a computationally 
trivial weighting scheme. 
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Figure 4: Application of the ABC-PRC algorithm with kernel variance of 0.01 
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Convergence of posterior variance when kernel variance is 0.01 
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Figure 5: Application of the ABC-PRC algorithm with kernel variance of 0.01. The 
horizontal line shows the theoretical value. 
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Figure 6: Application of the ABC-PRC algorithm with kernel variance of 1 
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Convergence of posterior variance when kernel variance is 1 
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Figure 7: Application of the ABC-PRC algorithm with kernel variance of 1. The hori- 
zontal line shows the theoretical value. 
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Figure 8: Application of the ABC-PRC algorithm with kernel variance of 10 
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Convergence of posterior variance when kernel variance is 10 
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Figure 9: Application of the ABC-PRC algorithm with kernel variance of 10. The hori- 
zontal line shows the theoretical value. 
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Figure 10: Application of the corrected ABC-PRC algorithm with kernel variance of 0.01. 
Compare with the uncorrected version in Figure H] 
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Convergence of posterior variance when kernel variance is 0.01 
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Figure 11: Application of the corrected ABC-PRC algorithm with kernel variance of 0.01. 
The horizontal line shows the theoretical value. Compare with the uncorrected version in 
Figure 
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5 Appendix 



The R Function for performing the original ABC-PRC 

abc . smcla <- f unction (npart , niter ,unif . lo ,unif .hi ,y , sigma2 ,eps , kern .mean, kern. var) 
{ 

# npart is the number of particles 

# niter is the number of smc iterations 

# unif.lo is the lower limit of the uniform distribution 

# unif.hi is the upper limit of the uniform distribution 

# y is the real data - we compute the only summary stat - the sample mean - from this 

# sigma2 is the known variance. 

# eps is the vector of tolerances - the euclidean distance must be less than this 

if (length(eps) != niter)stop("eps is wrong length") 
if (niter < 20)nscore <- niter 
else nscore <- 20 

p. history <- matrix (nrow=nscore ,ncol=npart) 
pt <- round (seqCl , niter , lengthens core) ) 
nsamp <- length(y) 
ymean <- mean(y) 

vl <- runif (npart ,unif . lo.unif . hi) 
k <- 1 

f or (j in 1 : niter) { 
vx <- numeric (0) 
while (T){ 

vv <- sample (vl , 100*npart , replace=T) 

ssvec <- lapply (lapply (vv,rnorm,n=nsamp, sd=sqrt (sigma2) ) ,mean) 

ind <- sqrt ( (as .numeric (ssvec) - ymean)~2) <= eps[j] 

if(sum(ind) == 0)next; 

vx <- c(vx,vv[ind] ) 

if (length(vx) >= npart){ 

vx <- vx[l:npart] 

break; 

> 

} 

ifCptM =- jH 

p .history [k,] <- vx; 
k <- k+1 

} 

vl <- vx + rnorm(npart , kern. mean, sqrt (kern . var) ) 

> 

p. history 

The R function for performing the corrected ABC-PRC 

abc . smcla. correct <- function (npart , niter ,unif . lo ,unif .hi , y , sigma2 , eps , kern. me an, kern. var) 

{ 

# npart is the number of particles 

# niter is the number of smc iterations 

# unif.lo is the lower limit of the uniform distribution 

# unif.hi is the upper limit of the uniform distribution 

# y is the real data - we compute the only summary stat - the sample mean - from this 

# sigma2 is the known variance. 

# eps is the vector of tolerance - the euclidean distance must be less than this 

if (length(eps) != niter) stop ("eps is wrong length") 
if (niter < 20)nscore <- niter 
else nscore <- 20 

p. history <- matrix (nrow=nscore ,ncol=npart) 
pt <- round (seq(l , niter , lengthens core) ) 
nsamp <- length(y) 
ymean <- mean(y) 

vl <- runif (npart ,unif . lo, unif . hi) 
wtvec <- rep (1, npart) 
k <- 1 

f or(j in 1 : niter) { 
vx <- numeric(O) 
while (T){ 

vv <- sample (vl , 100*npart , replace=T, prob=wtvec) 

ssvec <- lapply (lapply (vv, rnorm,n=ns amp, sd=sqrt (sigma2) ) ,mean) 

ind <- sqrt ( (as .numeric (ssvec) - ymean)~2) <= eps[j] 

if(sum(ind) == 0)next; 

vx <- c(vx,vv[ind] ) 

if (length(vx) >= npart){ 

vx <- vx[l:npart] 

break; 

> 

} 

ifCptM -= jH 

p. history tk,] <- vx; 
k <- k+1 

> 

vl <- vx + rnorm(npart , kern. mean, sqrt (kern . var) ) 
f or (j j in 1 : npart) wtvec [j j] <- 1.0/ sum(dnorm(vl [j j ] , vx , sqrt (kern. var) ) ) 

} 

p. history 
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